Statistical Machine Translation and Automatic Speech Recognition under Uncertainty

Author

  • Lambert Mathias
Abstract

Statistical modeling techniques have been applied successfully to natural language processing tasks such as automatic speech recognition (ASR) and statistical machine translation (SMT). Since most statistical approaches rely heavily on the availability of data and on the underlying model assumptions, reducing uncertainty is critical to their optimal performance. In speech translation, the uncertainty arises from the speech input to the SMT system, whose elements are represented as distributions over sequences. A novel approach to statistical phrase-based speech translation is proposed. This approach is based on a generative, source-channel model of translation, similar in spirit to the modeling approaches that underlie hidden Markov model (HMM)-based ASR systems; in fact, our model of speech-to-text translation contains the acoustic models of a large-vocabulary ASR system as one of its components. This model of speech-to-text translation is developed as a direct extension of the phrase-based models used in text translation systems. Speech is translated by mapping ASR word lattices to lattices of phrase sequences, which are then translated using operations developed for text translation. Efficient phrase extraction from ASR lattices and word- and phrase-level pruning strategies for speech translation are investigated to reduce uncertainty in the translation of speech. To achieve good translation performance, it is necessary to find optimal parameters under a particular training objective. Two discriminative training objective functions are investigated: Maximum Mutual Information (MMI) and Expected BLEU. A novel iterative optimization procedure using growth transformations is proposed as a parameter update procedure for the training criteria. The translation performance using growth-transformation-based updates is investigated in detail.
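The abstract does not spell out the growth-transformation update itself. As a rough illustration only, the sketch below applies the generic Baum-Eagon style update to a probability vector: each parameter is rescaled by its partial derivative plus a constant and then renormalized. The function names, the constant `C`, and the toy objective are assumptions made for this example. They are not taken from the thesis, whose actual objectives (MMI and Expected BLEU) are not reproduced here.

```python
from typing import Callable, List, Sequence


def growth_transform_step(
    theta: Sequence[float],
    grad: Callable[[Sequence[float]], Sequence[float]],
    C: float = 0.0,
) -> List[float]:
    """One generic growth-transformation update on the probability simplex.

    theta_i <- theta_i * (dF/dtheta_i + C) / sum_j theta_j * (dF/dtheta_j + C)

    C is a hypothetical smoothing constant; it must be large enough that
    every term (dF/dtheta_i + C) is non-negative.
    """
    g = grad(theta)
    weights = [t * (gi + C) for t, gi in zip(theta, g)]
    if any(w < 0 for w in weights):
        raise ValueError("C is too small: a rescaled parameter became negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("update is undefined: normalizer is not positive")
    return [w / total for w in weights]


def toy_objective(theta: Sequence[float]) -> float:
    """Toy polynomial objective F(theta) = theta_0^3 * theta_1 (illustrative only)."""
    return theta[0] ** 3 * theta[1]


def toy_gradient(theta: Sequence[float]) -> List[float]:
    """Partial derivatives of the toy objective."""
    return [3 * theta[0] ** 2 * theta[1], theta[0] ** 3]


def optimize(theta: Sequence[float], steps: int, C: float) -> List[List[float]]:
    """Apply the update repeatedly and return every iterate, starting point included."""
    history = [list(theta)]
    for _ in range(steps):
        history.append(growth_transform_step(history[-1], toy_gradient, C))
    return history
```

For this toy objective, a single step from the uniform start `[0.5, 0.5]` with `C = 0` lands on `[0.75, 0.25]`, the maximizer of the objective on the simplex. A larger `C` takes smaller steps while still never decreasing this objective, which is the kind of trade-off an iterative parameter update of this family has to manage.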


Similar resources

Automatic text dictation in computer-assisted translation

In this paper, we study the incorporation of statistical machine translation models to automatic speech recognition models in the framework of computer-assisted translation. The system is given a source language text to be translated and it shows the source text to the human translator to translate it orally. The system captures the user speech which is the dictation of the target language sent...

MISTRAL: a Statistical Machine Translation Decoder for Speech Recognition Lattices

This paper presents MISTRAL, an open source statistical machine translation decoder dedicated to spoken language translation. While typical machine translation systems take a written text as input, MISTRAL translates word lattices produced by automatic speech recognition systems. The lattices are translated in two passes using a phrase-based model. Our experiments reveal an improvement in BLEU ...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Amharic-English Speech Translation in Tourism Domain

This paper describes speech translation from Amharic-to-English, particularly Automatic Speech Recognition (ASR) with post-editing feature and Amharic-English Statistical Machine Translation (SMT). ASR experiment is conducted using morpheme language model (LM) and phoneme acoustic model (AM). Likewise, SMT conducted using word and morpheme as unit. Morpheme based translation shows a 6.29 BLEU sc...

The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2007

We describe the TÜBİTAK-UEKAE system that participated in the Arabic-to-English and Japanese-to-English translation tasks of the IWSLT 2007 evaluation campaign. Our system is built on the open-source phrase-based statistical machine translation software Moses. Among available corpora and linguistic resources, only the supplied training data and an Arabic morphological analyzer are used in the sys...



Publication date: 2007